Loading up!

library(bayesrules)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.4     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.1     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(rstan)
## Loading required package: StanHeaders
## rstan (Version 2.21.2, GitRev: 2e1f913d3ca3)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)
## 
## Attaching package: 'rstan'
## The following object is masked from 'package:tidyr':
## 
##     extract
library(rstanarm)
## Loading required package: Rcpp
## This is rstanarm version 2.21.1
## - See https://mc-stan.org/rstanarm/articles/priors for changes to default priors!
## - Default priors may change, so it's safest to specify priors, even if equivalent to the defaults.
## - For execution on a local, multicore CPU with excess RAM we recommend calling
##   options(mc.cores = parallel::detectCores())
## 
## Attaching package: 'rstanarm'
## The following object is masked from 'package:rstan':
## 
##     loo
library(bayesplot)
## This is bayesplot version 1.8.1
## - Online documentation and vignettes at mc-stan.org/bayesplot
## - bayesplot theme set to bayesplot::theme_default()
##    * Does _not_ affect other ggplot2 plots
##    * See ?bayesplot_theme_set for details on theme setting
library(tidybayes)
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(broom.mixed)

Question 9.1: Normal Regression Priors

For the normal regression model from the chapter:

  1. Why is a normal prior a reasonable choice for beta_0 and beta_1?

Values for the betas can be any value on the real number line. A normal model can also include values on the real number line, so this is an appropriate prior model.

  1. Why isn’t a Normal prior a reasonable choice for sigma?

Sigma values must be positive, thus excluding many of the values that are possible for a normal model, making that an inappropriate choice. An exponential model goes from 0 to infinity which makes it a better choice.

  1. What’s the difference between weakly informative and vague priors?

Weakly informative priors take default values and are scaled according to the data to make sure the values are of the right order of magnitude. They give some direction for the Bayesian model and make sure values are sensible for the model. Vague priors would be based on our own intuition of what a prior model might look like, not based on the data. This is much more hit or miss and could result in unstable simulations if our vague priors are way off the mark.

Question 9.2: Identify the variable

Identify the response variable (Y) and predictor variable (X) in each given relationship of interest.

  1. We want to use a person’s arm length to understand their height

Y = height

X = arm length

  1. We want to predict a person’s carbon footprint with the distance between their home and work

Y = carbon footprint

X = distance between home and work

  1. We want to understand how a child’s vocabulary level might increase with age

Y = vocabulary level

X = age

  1. We want to use info about a person’s sleep habits to predict their reaction time

Y = reaction time

X = sleep habits

Question 9.3: Interpreting coefficients

Suppose that the typical relationship between the given response variable Y and predictor X can be described by \(\beta_0 + \beta_1X\). Interpret the meaning of beta_0 and beta_1 and indicate whether your prior understanding suggests that beta_1 is negative or positive:

  1. Y = height in cm of a baby kangaroo, X = its age in months

\(\beta_0\) is the typical height of baby kangaroos when they are 0 months old. \(\beta_1\) is the change in height of baby kangaroos for every one unit change in kangaroos’ age (i.e., one month).

I would expect height to increase with age so that \(\beta_1\) would be positive. Assuming this is only for baby kangaroos. If we’re looking at kangaroos at large, I would expect there to be a positive \(\beta_1\) with a drop off after a certain age where they are full grown.

  1. Y = a data scientist’s number of GitHub followers, X = their number of GitHub commits in the past week

\(\beta_0\) is the typical number of GitHub followers for data scientists with 0 commits in the past week. \(\beta_1\) is the change in GitHub followers for data scientists for every one unit change in number of commits in the past week.

Honestly…I might think that this relationship would be very weakly positive if not zero…but maybe that’s just because I don’t browse on GitHub. Optimistically speaking, I will say that this will produce a positive \(\beta_1\) value.

  1. Y = number of visitors to a local park on a given day, X = the rainfall in inches on that day

\(\beta_0\) is the typical number of visitors to a local park on days where there is 0 rainfall. \(\beta_1\) is the change in visitors to the park for every one unit change in inches of rainfall.

I would expect \(\beta_1\) to be negative. That is, when the rainfall increases, the number of visitors decreases

  1. Y = the daily hours of Netflix that a person watches, X = the typical number of hours that they sleep

\(\beta_0\) is the typical hours of Netflix a person watches when they sleep zero hours. \(\beta_1\) is the change in hours of Netflix watching for every one unit change in hours of sleep.

I would expect \(\beta_1\) to be negative. That is, when people get more sleep, they watch less Netflix. Because the hourly decision people make is always between sleep and Netflix! (at least beyond work hours, it might be)

Question 9.4: Deviation from the average

Explain in one or two sentences how sigma is related to the strength of the relationship between a response variable Y and predictor X.

Sigma measure the variability from the local mean on days with similar temperatures. If the sigma is small, this means that there is not a lot of variability from the mean of our model, which means that X is a strong predictor of our response variable Y.

Question 9.5: Bayesian model building - part I

A researcher wants to use person’s age (in years) to predict their annual orange juice consumption (in gallons). Here you’ll build a relevant Bayesian regression model:

  1. Identify the Y and X variables

Y = annual orange juice consumption (in gallons)

X = age (in years)

  1. Use math notation to specify an appropriate structure for the model of Y data

I’m going to assume a normal distribution for the response variable, Y. With this, I will specify the normal model specifying both mu and sigma:

\(Y_i | \mu \text{ ~ } N(\mu, \sigma^2)\)

\(\mu \text{ ~ } N(\theta,\tau^2)\)

\(\sigma \text{ ~ some prior model}\)

  1. Rewrite the structure of the data model to incorporate info about the predictor. Assume a linear relationship between X and Y

I will assume a linear relationship and focus on a local mean, giving the local mean of a specific age X_i

\(\mu_i = \beta_0 + \beta_1 X_i\)

Putting this together with the above, we get a re-write:

\(Y_i|\beta_0,\beta_i,\sigma \text{ ~ } N(\mu_i,\sigma^2) \text{ with } \mu_i = \beta_0 + \beta_1X_i\)

  1. Identify all unknown parameters in your model. Indicate the values the parameter can take:

The unknowns are beta_0, beta_1 and sigma. Beta_0 is the typical annual orange juice consumption for people aged 0. Beta_1 is the change in annual orange juice consumption for each unit increase in age. Beta_0 and beta_1 can both take values that are real numbers. Sigma is the variability from the local mean for people of the same age. This can take values from 0 to infinity

Given the above, we know that betas can assume values that are on the real number line. This is consistent with normal models for the beta parameters. For sigma, given that it takes values from 0 to infinity, we can use an exponential model.

  1. Identify and tune suitable prior models for your parameters and explain rationale.

To guess at potential reasonable values for this I can take some assumptions. First I will limit this to D.C. and assume that the average age of the population is about 40. I think that:

Those are all specified below:

\(\beta_0 \text{ ~ } N(15,5^2)\)

\(\beta_1 \text{ ~ } N(0.5,0.25^2)\)

\(\sigma \text{ ~ } Exp(0.2)\)

Question 9.6: Bayesian model building - part II

Repeat the above exercise for the following scenario: a researcher wishes to predict tomorrow’s high temperature by today’s high temperature

  1. Identify the Y and X variables

Y = tomorrow’s high temperature

X = today’s high temperature

  1. Use math notation to specify an appropriate structure for the model of Y data

I’m going to assume a normal distribution for the response variable, Y. With this, I will specify the normal model specifying both mu and sigma:

\(Y_i | \mu \text{ ~ } N(\mu, \sigma^2)\)

\(\mu \text{ ~ } N(\theta,\tau^2)\)

\(\sigma \text{ ~ some prior model}\)

  1. Rewrite the structure of the data model to incorporate info about the predictor. Assume a linear relationship between X and Y

I will assume a linear relationship and focus on a local mean, giving the local mean of a specific age X_i

\(\mu_i = \beta_0 + \beta_1 X_i\)

Putting this together with the above, we get a re-write:

\(Y_i|\beta_0,\beta_i,\sigma \text{ ~ } N(\mu_i,\sigma^2) \text{ with } \mu_i = \beta_0 + \beta_1X_i\)

  1. Identify all unknown parameters in your model. Indicate the values the parameter can take:

The unknowns are beta_0, beta_1 and sigma. Beta_0 is the typical high temperature tomorrow for a high temperature of 0 today. Beta_1 is the change in tomorrow’s high temperature for each unit increase in today’s high temperature. Beta_0 and beta_1 can both take values that are real numbers. Sigma is the variability from the local mean for today’s of the same temperature. This can take values from 0 to infinity.

Given the above, we know that betas can assume values that are on the real number line. This is consistent with normal models for the beta parameters. For sigma, given that it takes values from 0 to infinity, we can use an exponential model.

  1. Identify and tune suitable prior models for your parameters and explain rationale.

Here I can name a few assumptions that will help me create suitable prior models. First, again, I will assume this is DC we’re talking about:

All of these assumptions produce the following prior model:

\(\beta_0 \text{ ~ } N(65,2.5^2)\)

\(\beta_1 \text{ ~ } N(1,0.5^2)\)

\(\sigma \text{ ~ } Exp(0.333)\)

Question 9.7: Posterior simulation T/F

Mark each statement about posterior regression simulation as True or False:

  1. MCMC provides the exact posterior model of our regression parameters \((\beta_0,\beta_1,\sigma)\)

False

  1. MCMC allows us to avoid complicated mathematical derivations

True

Question 9.8: Posterior simulation

For each situation, specify the appropriate stan_glm() syntax for simulating the Normal regression model using 4 chains, each of length 10000

  1. X = age; Y = height; dataset: bunnies
bunny_model <- stan_glm(height ~ age, data = bunnies,
                       family = gaussian,
                       prior_intercept = normal(_mean_,2.5,autoscale=TRUE),
                       prior=normal(0,2.5,autoscale = TRUE),
                       prior_aux = exponential(1,autoscale=TRUE),
                       chains=4,iter=5000*2,seed=369)
  1. \(\text{Clicks}_i | \beta_0,\beta_1,\sigma \text{ ~ } N(\mu_i,\sigma^2) \text{ with } \mu_i = \beta_0 + \beta_1\text{Snaps}_i\) with data set: songs
songs_model <- stan_glm(clicks ~ snaps, data = songs,
                       family = gaussian,
                       prior_intercept = normal(_mean_,2.5,autoscale=TRUE),
                       prior=normal(0,2.5,autoscale = TRUE),
                       prior_aux = exponential(1,autoscale=TRUE),
                       chains=4,iter=5000*2,seed=369)
  1. \(\text{Happiness}_i | \beta_0,\beta_1,\sigma \text{ ~ } N(\mu_i,\sigma^2) \text{ with } \mu_i = \beta_0 + \beta_1\text{Age}_i\) with data set: dogs
dogs_model <- stan_glm(happiness ~ age, data = dogs,
                       family = gaussian,
                       prior_intercept = normal(_mean_,2.5,autoscale=TRUE),
                       prior=normal(0,2.5,autoscale = TRUE),
                       prior_aux = exponential(1,autoscale=TRUE),
                       chains=4,iter=5000*2,seed=369)

Question 9.9: How humid is too humid - model building

Explore the Normal regression model of rides (Y) by humidity (X) using the bikes dataset. Based on past analysis, suppose we have the following prior understanding:

  1. Tune the Normal regression model to match our prior understanding. Use notation to write out the complete Bayesian structure of this model

Prior assumption 1 centers on the average humidity day. We can use a normal model to tune the prior on this.

\(\beta_{0c} \text{ ~ } N(5000,sd^2)\)

Looking at patterns from the chapter, it loos like determining the variance and sd values relies on the 2 standard deviation rule of \(\mu +/- 2\sigma\). For this spread of 1000 and 9000 centered on 5000, this would make sd = 2000. I’ll plot it to see:

plot_normal(5000,2000)

This seems reasonable, so will give us a normal of:

\(\beta_{0c} \text{ ~ } N(5000,2000^2)\)

Next, we can tune the beta_1 value. \(\beta_1 \text{ ~ } N(-10,sd^2)\)

This value is negative because we are looking at decreases in ridership for increases in humidity.

For the sd, again, we can use the same formula relating a reasonable sd to the mu. Here we have range of 0 to 20 centered on 10, which would make sd = 5.

plot_normal(-10,5)

\(\beta_1 \text{ ~ } N(-10,5^2)\)

Finally, we’ll check the sigma. We use the mean formula for exponential models and the sd of 2000 rides:

\(E(\sigma) = \frac{1}{l} = 2000 \text{ rides}\), this a rate of \(l = \frac{1}{2000} = 0.0005: \sigma \text{ ~ } Exp(0.0005)\)

Putting all of this together, we get a Bayesian structure of:

The Bayesian regression model is specified as shown:

data:

\(Y_i|\beta_0,\beta_i,\sigma \text{ ~ } N(\mu_i,\sigma^2) \text{ with } \mu_i = \beta_0 + \beta_1X_i\)

priors:

\(\beta_{0c} \text{ ~ } N(5000,2000^2)\)

\(\beta_1 \text{ ~ } N(-10,5^2)\)

\(\sigma \text{ ~ } Exp(0.0005)\)

  1. To explore the combined prior understanding, simulate the normal regression with 5 chains run for 8000 iterations each
#loading the data
data("bikes")

#running the regression prior model
humid_model <- stan_glm(rides ~ humidity, data = bikes,
                        family = gaussian,
                        prior_intercept = normal(5000,2000),
                        prior = normal(-10,5),
                        prior_aux = exponential(0.0005),
                        prior_PD = TRUE,
                        chains = 5, iter = 4000*2, seed=369)

Let me do some quick math checks:

neff_ratio(humid_model)
## (Intercept)    humidity       sigma 
##     0.77735     0.76110     0.87910
rhat(humid_model)
## (Intercept)    humidity       sigma 
##   0.9998639   1.0000464   1.0001088

These are all looking good with Neff of over 0.1 and near 1 and the rhat values near one and less than 1.05.

  1. Plot 100 prior plausible model lines and 4 data sets under the priors:
#plot prior model lines

bikes |> 
  add_fitted_draws(humid_model,n=100) |> 
  ggplot(aes(x=humidity,y=rides)) +
  geom_line(aes(y=.value,group=.draw),alpha=0.15)+
  geom_point(data=bikes,size=0.05)
## Warning: `fitted_draws` and `add_fitted_draws` are deprecated as their names were confusing.
## Use [add_]epred_draws() to get the expectation of the posterior predictive.
## Use [add_]linpred_draws() to get the distribution of the linear predictor.
## For example, you used [add_]fitted_draws(..., scale = "response"), which
## means you most likely want [add_]epred_draws(...).

# 4 data sets

bikes |> 
  add_predicted_draws(humid_model,n=4) |> 
  ggplot(aes(x=humidity,y=riders)) +
  geom_point(aes(y=.prediction,group=.draw),size=0.2) +
  facet_wrap(~ .draw)
## Warning: 
## In add_predicted_draws(): The `n` argument is a deprecated alias for `ndraws`.
## Use the `ndraws` argument instead.
## See help("tidybayes-deprecated").

  1. Describe our overall prior understanding of the relationship between ridership and humidity

These plots show us that humidity and rides is very very very loosely related to ridership, if at all. There isn’t really clear convergence on one slope, though many look vaguely negative. There seems to be a lot of variability. The data set plots show that 3 of the simulated plots have a line that looks close to zero while the fourth plot shows extremely high spread between the data points.

Question 9.10: how humid is too humid - data

With the priors in place, let’s examine the data

  1. Plot and discuss the observed relationship between ridership and humidity in the bikes data:
#plot data

ggplot(bikes,aes(x=humidity,y=rides)) +
  geom_point(size=0.75)+
  geom_smooth(method="lm",se=FALSE)
## `geom_smooth()` using formula 'y ~ x'

Looking at this data, this appears to show a very weak negative relationship between rides and humidity.

  1. Does simple Normal regression seem to be a reasonable approach to modeling the relationship?

I have questions if this is a reasonable approach. There seems to be a fairly linear relationship between these two variables, however the data gives a large blob without a clear shape, whereas other data relationship’s we’ve looked at does appear to have a vaguely linear shape. Furthermore, there has to be some plateau of rides for all of this data where the X variables will stop having any effect on the response variable due to factors like number of citizens in the set and limits to how many rides you could possibly take.

Question 9.11 - how humid is too humid - posterior simulation

We can now simulate the posterior model with our prior model and the data

  1. Use stan_glm() to simulate the Normal regression posterior model. Do this with 5 chains run for 8000 iterations each.

To do this, we take the same data as above without the prior_PD off:

humid_model_post <- stan_glm(rides ~ humidity, data = bikes,
                        family = gaussian,
                        prior_intercept = normal(5000,2000),
                        prior = normal(-10,5),
                        prior_aux = exponential(0.0005),
                        chains = 5, iter = 4000*2, seed=369)
  1. Perform and discuss some MCMC diagnostics to determine if we can trust these

First I’ll look at some quant diagnostics:

neff_ratio(humid_model_post)
## (Intercept)    humidity       sigma 
##      1.0141      1.0055      1.0017
rhat(humid_model_post)
## (Intercept)    humidity       sigma 
##   1.0000338   0.9999086   0.9999706

These values all look good with Neff over 1 and rhat near 1 and less than 1.05.

Then I will look at some visual diagnostics

#trace plots
mcmc_trace(humid_model_post)

#dens plot
mcmc_dens_overlay(humid_model_post)

These look like they are all mixing well, nothing stands out.

  1. Plot 100 posterior model lines for the relationship between ridership and humidity. Compare and contrast them to the prior model lines:
#plot posterior model lines

bikes |> 
  add_fitted_draws(humid_model_post,n=100) |> 
  ggplot(aes(x=humidity,y=rides)) +
  geom_line(aes(y=.value,group=.draw),alpha=0.15)+
  geom_point(data=bikes,size=0.05) +
  labs(title = "Posterior Humidity + Rides Model")
## Warning: `fitted_draws` and `add_fitted_draws` are deprecated as their names were confusing.
## Use [add_]epred_draws() to get the expectation of the posterior predictive.
## Use [add_]linpred_draws() to get the distribution of the linear predictor.
## For example, you used [add_]fitted_draws(..., scale = "response"), which
## means you most likely want [add_]epred_draws(...).

This gives a much more converged line for the relationship between rides and humidity than the prior simulations did. However, there appears to be both negative slope lines and lines that could be zero within this 100 plot simulation.

Question 9.12: How humid is too humid - posterior interpretation

  1. Provide a tidy() summary of your posterior model, including 95% credible interval
#posterior summary stats
tidy(humid_model_post,effects=c("fixed","aux"),
     conf.int=TRUE,conf.level=0.95)
## # A tibble: 4 × 5
##   term        estimate std.error conf.low conf.high
##   <chr>          <dbl>     <dbl>    <dbl>     <dbl>
## 1 (Intercept)  4018.      223.     3586.    4463.  
## 2 humidity       -8.42      3.36    -15.1     -1.90
## 3 sigma        1574.       49.8    1482.    1677.  
## 4 mean_PPD     3484.      101.     3288.    3683.
  1. Interpret the posterior median value of the sigma parameter

The estimated sigma has a posterior median of 1573.6. This means that on average we can expect that the observed ridership on a given day will fall 1574 rides from the average ridership on days of the same humidity.

  1. Interpret the 95% posterior credible interval for the humidity coefficient \(\beta_1\)

The CI for \(\beta_1\) is (-15.1, -1.9). This means that there is 95% certainty that for every unit increase in humidity, the change in rides is between -15.1 and -1.9.

  1. Do we have ample posterior evidence that there’s a negative association between ridership and humidity?

No, the CI captures only values below 0, but the visual seems to include potential 0 and positive lines. Finally we can look at some numbers:

# store the chains in a dataframe
humid_model_df <- as.data.frame(humid_model_post)

#tabulate the beta_1 values that are less than 0

humid_model_df |> 
  mutate(under_0 = humidity < 0) |> 
  tabyl(under_0)
##  under_0     n percent
##    FALSE   112  0.0056
##     TRUE 19888  0.9944

This shows that there are some values that fall above 0.

Question 9.13: humid - prediction

Tomorrow is supposed to be 90% humidity in DC. What levels of ridership do we expect?

  1. Without using posterior_predict(), simulate two posterior models:

We can do these in the same code chunk:

#predict rides for each parameter set in the chain

set.seed(369)
predict_90 <- humid_model_df |> 
  mutate(mu = `(Intercept)` + humidity*90,
         y_new = rnorm(20000,mean=mu,sd=sigma))

head(predict_90,3)
##   (Intercept)  humidity    sigma       mu    y_new
## 1    3699.198  -3.58787 1531.095 3376.290 2226.984
## 2    4178.352 -11.36117 1618.299 3155.846 4608.517
## 3    4181.567 -11.33917 1570.467 3161.042 4330.981

The mu value provides the posterior for a typical number of riders on 90% days whereas the y_new provides the predictive model for the number of riders tomorrow.

  1. Construct, discuss, and compare density plots for the posterior models above:
#plot the posterior model of the typical riders on 90% humidity days
ggplot(predict_90,aes(x=mu)) +
  geom_density()+
  xlim(-3000,9000)+
  labs(title = "Typical Riders on 90% humid days")

#plot posterior predictive model of tmr's riders on a 90% humidity day
ggplot(predict_90,aes(x=y_new))+
  geom_density()+
  xlim(-3000,9000) +
  labs(title="Tomorrow's 90% day")
## Warning: Removed 3 rows containing non-finite values (stat_density).

This shows much higher certainty in the typical ridership days than in tomorrow’s prediction. In tomorrow’s prediction, this also shows values that could be negative, zero, or positive in the distribution shown.

  1. Calculate and interpret an 80% posterior prediction interval for the number of riders tomorrow
predict_90 |> 
  summarize(lower_mu = quantile(mu,0.10),
            upper_mu = quantile(mu,0.90),
            lower_new = quantile(y_new,0.10),
            upper_new = quantile(y_new,0.90))
##   lower_mu upper_mu lower_new upper_new
## 1 3114.097 3407.064  1223.489  5284.054

The 80% CI for the number of riders tomorrow is (1470.64, 5531.34). This means that we have 80% certainty that the number of riders on tomorrow’s 75 degree day will fall between 1471 and 5531. This is a much wider range than the mu values to the left.

  1. Use posterior_predict() to confirm the results from the posterior predictive model of tomorrow’s ridership:
#simulate a set of predictions
set.seed(369)
shortcut_prediction <- posterior_predict(humid_model_post,newdata=data.frame(humidity = 90))

#construct a 80% posterior CI
posterior_interval(shortcut_prediction,prob=0.80)
##        10%      90%
## 1 1223.489 5284.054
#plot the approximate predictive model
mcmc_dens(shortcut_prediction) +
  xlab("predicted ridership on tomorrow's 90% humidity day")

This looks very similar to the posterior model I created above! And the 80% CI is the same.

Question 9.14: on my own - part 1

Let’s now explore the relationship between ridership (Y) and windspeed (X)

  1. Describe a prior understanding of the relationship between ridership

To get some information about what these values could be I will look at the relationship of these variables in the overall dataset:

# data looking 

ggplot(bikes,aes(x=windspeed,y=rides))+
  geom_point()+
  geom_smooth(method="lm",se=FALSE)
## `geom_smooth()` using formula 'y ~ x'

Taking priors from the data, I can take an intercept value of around 4000 as the beta_0, mu and an sd of 1000. Then to get a vague sense of the slope it looks like the slope value is about -1000/15 = -66.7 as beta_1, mu and an sd = 100.

Then to get a prior on the sigma, the exponential parameter l is 1 / sigma. For sigma I will choose 2000 to capture a majority of the data within 2 standard deviations. This provides an l value of 1/2000 = 0.0005

  1. Tune the Normal regression model to match your prior understanding. Use careful notation to write out the complete Bayesian structure of this model

The Bayesian regression model is specified as shown:

data:

\(Y_i|\beta_0,\beta_i,\sigma \text{ ~ } N(\mu_i,\sigma^2) \text{ with } \mu_i = \beta_0 + \beta_1X_i\)

priors:

\(\beta_{0c} \text{ ~ } N(4000,1000^2)\)

\(\beta_1 \text{ ~ } N(-66.7,100^2)\)

\(\sigma \text{ ~ } Exp(0.0005)\)

I’ll build this in rstanarm using 4 chains with 10000 runs like we usually do:

#specify the model

wind_model <- stan_glm(rides ~ windspeed, data = bikes,
                       family = gaussian,
                       prior_intercept = normal(4000,1000),
                       prior = normal(-66.7,100),
                       prior_aux = exponential(0.0005),
                       chains = 4, iter = 5000*2,seed = 369,
                       prior_PD = TRUE)
  1. Plot and discuss 100 prior plausible lines and 4 datasets simulated under the priors
#100 prior model lines

bikes |> 
  add_fitted_draws(wind_model,n=100) |> 
  ggplot(aes(x=windspeed,y=rides)) +
  geom_line(aes(y=.value,group=.draw),alpha=0.15)
## Warning: `fitted_draws` and `add_fitted_draws` are deprecated as their names were confusing.
## Use [add_]epred_draws() to get the expectation of the posterior predictive.
## Use [add_]linpred_draws() to get the distribution of the linear predictor.
## For example, you used [add_]fitted_draws(..., scale = "response"), which
## means you most likely want [add_]epred_draws(...).

#4 prior simulated datasets
set.seed(3)

bikes |> 
  add_predicted_draws(wind_model,n=4) |> 
  ggplot(aes(x=windspeed,y=rides))+
  geom_point(aes(y=.prediction,group=.draw))+
  facet_wrap(~.draw)
## Warning: 
## In add_predicted_draws(): The `n` argument is a deprecated alias for `ndraws`.
## Use the `ndraws` argument instead.
## See help("tidybayes-deprecated").

This shows a lot of different plausible lines that go all over the place. There is no clear relationship between rides and windspeed in this prior model. For the 4 datasets, those do seem to show a negative relationship with pretty limited variability.

  1. Construct and discuss a plot of rides v windspeed data. How consistent are the observed patterns with your prior understanding of this relationship?
ggplot(bikes,aes(x=windspeed,y=rides)) +
  geom_point(size=0.75)+
  geom_smooth(method="lm",se=FALSE)
## `geom_smooth()` using formula 'y ~ x'

This data shows a negative relationship and a pretty large spread of data points. This is consistent with the prior understanding in that the 4 simulated data sets all showed a negative relationship, but the plausible plots had a lot of variability.

Question 9.15: On your own - part II

Construct a posterior analysis of this relationship between ridership (Y) and windspeed (X).

To construct the posterior model I will update the prior model:

wind_model <- update(wind_model,prior_PD = FALSE)

Next, I will do the standard analytic and visual checks to make sure this model is mixing well:

neff_ratio(wind_model)
## (Intercept)   windspeed       sigma 
##     0.95990     0.97520     0.97285
rhat(wind_model)
## (Intercept)   windspeed       sigma 
##   1.0000899   0.9999894   1.0000395

These look good with neff ratios of near 1 and rhats near 1 less than 1.05.

Now for visuals:

#mcmc trace plot
mcmc_trace(wind_model)

#mcmc density
mcmc_dens_overlay(wind_model)

And these show that there is good mixing with no outliers, dips, and stuck points.

Now I will plot posterior plausible model lines, get a tidy summary and see the porportion of beta values under 0:

#plot posterior model lines

bikes |> 
  add_fitted_draws(wind_model,n=100) |> 
  ggplot(aes(x=windspeed,y=rides)) +
  geom_line(aes(y=.value,group=.draw),alpha=0.15)+
  geom_point(data=bikes,size=0.05) +
  labs(title = "Posterior Windspeed + Rides Model")
## Warning: `fitted_draws` and `add_fitted_draws` are deprecated as their names were confusing.
## Use [add_]epred_draws() to get the expectation of the posterior predictive.
## Use [add_]linpred_draws() to get the distribution of the linear predictor.
## For example, you used [add_]fitted_draws(..., scale = "response"), which
## means you most likely want [add_]epred_draws(...).

#posterior summary stats
tidy(wind_model,effects=c("fixed","aux"),
     conf.int=TRUE,conf.level=0.95)
## # A tibble: 4 × 5
##   term        estimate std.error conf.low conf.high
##   <chr>          <dbl>     <dbl>    <dbl>     <dbl>
## 1 (Intercept)   4210.      177.    3867.     4554. 
## 2 windspeed      -55.7      12.4    -79.6     -31.8
## 3 sigma         1548.       48.9   1456.     1648. 
## 4 mean_PPD      3484.       98.1   3291.     3676.
# store the chains in a dataframe
wind_model_df <- as.data.frame(wind_model)

#tabulate the beta_1 values that are less than 0

wind_model_df |> 
  mutate(under_0 = windspeed < 0) |> 
  tabyl(under_0)
##  under_0     n percent
##     TRUE 20000       1

All of the above steps give us information about the posterior model. Here we see that the relationship is negative between rides and windspeed looking at the visual representation in the plausible posterior plot lines. The 95% confidence interval for beta_1 is (-79.6 , -31.8) which is all negative, and when checked in analysis, we saw that all values of beta_1 were under 0, meaning that we have a lot of confidence in this negative relationship.

Question 9.16: Penguin - model building and simulation

Use the penguins data set to model the length of penguin flippers in mm (Y) by the length of their bills in mm (X). We have a general sense that this is somewhere between 150mm and 250mm long.

  1. Simulate the Normal regression prior model using 4 chains for 10000 iterations each

To build this model, we will mostly use weakly informative priors. But for the \(\beta_0\) we can use the given assumptions to designate a mu of 200 and sd = 25.

data("penguins_bayes")

penguin_model <- stan_glm(flipper_length_mm ~ bill_length_mm, data = penguins_bayes,
                          family = gaussian,
                          prior_intercept = normal(200,25),
                          prior = normal(0,2.5,autoscale=TRUE),
                          prior_aux = exponential(1,autoscale=TRUE),
                          prior_PD = TRUE,
                          chains = 4, iter = 10000, seed=369)
  1. Check the prior_summary() and use this to write out the complete structure of your normal regression model
prior_summary(penguin_model)
## Priors for model 'penguin_model' 
## ------
## Intercept (after predictors centered)
##  ~ normal(location = 200, scale = 25)
## 
## Coefficients
##   Specified prior:
##     ~ normal(location = 0, scale = 2.5)
##   Adjusted prior:
##     ~ normal(location = 0, scale = 6.4)
## 
## Auxiliary (sigma)
##   Specified prior:
##     ~ exponential(rate = 1)
##   Adjusted prior:
##     ~ exponential(rate = 0.071)
## ------
## See help('prior_summary.stanreg') for more details

With this we can specify the model below:

\(Y_i|\beta_0,\beta_i,\sigma \text{ ~ } N(\mu_i,\sigma^2) \text{ with } \mu_i = \beta_0 + \beta_1X_i\)

priors:

\(\beta_0 \text{ ~ } N(200,25^2)\)

\(\beta_1 \text{ ~ } N(0,6.4^2)\)

\(\sigma \text{ ~ } Exp(0.071)\)

  1. Plot 100 prior plausible model lines and 4 datasets simulated under the priors

Note: I was getting an error message that NAs aren’t allowed in newdata, so I added a line to drop_NAs after calling the data set.

#100 prior model lines

penguins_bayes |> 
  drop_na() |> 
  add_fitted_draws(penguin_model,n=100) |> 
  ggplot(aes(x=bill_length_mm,y=flipper_length_mm)) +
  geom_line(aes(y=.value,group=.draw),alpha=0.15)
## Warning: `fitted_draws` and `add_fitted_draws` are deprecated as their names were confusing.
## Use [add_]epred_draws() to get the expectation of the posterior predictive.
## Use [add_]linpred_draws() to get the distribution of the linear predictor.
## For example, you used [add_]fitted_draws(..., scale = "response"), which
## means you most likely want [add_]epred_draws(...).

#4 prior simulated datasets
set.seed(3)

penguins_bayes |> 
  drop_na() |> 
  add_predicted_draws(penguin_model,n=4) |> 
  ggplot(aes(x=bill_length_mm,y=flipper_length_mm))+
  geom_point(aes(y=.prediction,group=.draw))+
  facet_wrap(~.draw)
## Warning: 
## In add_predicted_draws(): The `n` argument is a deprecated alias for `ndraws`.
## Use the `ndraws` argument instead.
## See help("tidybayes-deprecated").

  1. Summarize the weakly informative prior understanding of the relationship between flipper and bill length

This weakly informative prior is pretty weak. The prior plausible lines are truly all over the place, some negative and some positive. The 4 simulated datasets are in different directions and have variance in their variances.

Question 9.17: Penguins - data

  1. Plot and discuss the observed relationship between flipper and bill length
ggplot(penguins_bayes,aes(x=bill_length_mm,y=flipper_length_mm)) +
  geom_point(size=0.75)+
  geom_smooth(method="lm",se=FALSE)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).

This shows a fairly strong positive relationship between the bill and flipper length for the sampled penguins.

  1. Does simple normal regression seem to be a reasonable approach to modeling this relationship?

Yes, this relationship seems appropriately linear when we look at the actual data.

Question 9.18: Penguins - posterior analysis

  1. Use stan_glm() to simulate the Normal regression posterior model

To do this, I will update the model I already made

penguin_model <- update(penguin_model,prior_PD = FALSE)
  1. Plot 100 posterior model lines for the relationship
#100 prior model lines

penguins_bayes |> 
  drop_na() |> 
  add_fitted_draws(penguin_model,n=100) |> 
  ggplot(aes(x=bill_length_mm,y=flipper_length_mm)) +
  geom_line(aes(y=.value,group=.draw),alpha=0.15)
## Warning: `fitted_draws` and `add_fitted_draws` are deprecated as their names were confusing.
## Use [add_]epred_draws() to get the expectation of the posterior predictive.
## Use [add_]linpred_draws() to get the distribution of the linear predictor.
## For example, you used [add_]fitted_draws(..., scale = "response"), which
## means you most likely want [add_]epred_draws(...).

  1. Provide a tidy() summary of the posterior model, including a 90% CI
#posterior summary stats
tidy(penguin_model,effects=c("fixed","aux"),
     conf.int=TRUE,conf.level=0.9)
## # A tibble: 4 × 5
##   term           estimate std.error conf.low conf.high
##   <chr>             <dbl>     <dbl>    <dbl>     <dbl>
## 1 (Intercept)      127.       4.74    119.      134.  
## 2 bill_length_mm     1.69     0.107     1.52      1.86
## 3 sigma             10.6      0.407    10.0      11.3 
## 4 mean_PPD         201.       0.808   200.      202.
  1. Interpret the 90% CI for the \(\beta_1\)

This is showing a 90% CI for \(\beta_1\) as (1.5, 1.9). This is a pretty narrow CI that is completely above zero. This gives us some confidence that the relationship between bill and flipper length is positive and that a single unit increase in bill length produces an increase in flipper length of between 1.5 and 1.9.

  1. Do we have ample posterior evidence that penguins with longer bills tend to have longer flippers?

Yes. We have visual evidence that all of the plausible lines are in the positive direction. We have evidence in that the 90% CI is a range with values above 0. And we can check mathematically below:

# store the chains in a dataframe
penguin_model_df <- as.data.frame(penguin_model)

#tabulate the beta_1 values that are less than 0

penguin_model_df |> 
  mutate(exceeds_0 = bill_length_mm > 0) |> 
  tabyl(exceeds_0)
##  exceeds_0     n percent
##       TRUE 20000       1

This shows that all values exceed 0 (are positive). This gives us further evidence that this relationship is positive.

Question 9.19: Penguins - Prediction

A researcher comes across Pablo the penguin. They are able to ascertain that Pablo’s bill is 51mm long but can’t measure the flipper

  1. Without the posterior_predict() shortcut function, simulate the posterior model for the typical flipper length among penguins with 51mm bills as well as the posterior predictive model for Pablo’s flipper length

I will do this drawing 20k samples from the posterior model that I constructed above:

#predict flipper length for each parameter set in the chain

set.seed(369)
predict_51 <- penguin_model_df |> 
  mutate(mu = `(Intercept)` + bill_length_mm*51,
         y_new = rnorm(20000,mean=mu,sd=sigma))

head(predict_51,3)
##   (Intercept) bill_length_mm    sigma       mu    y_new
## 1    126.3104       1.715876 10.81434 213.8201 205.7024
## 2    139.0349       1.396766 11.06823 210.2699 220.2054
## 3    120.5937       1.851266 11.20938 215.0082 223.3588

The mu value provides the posterior for a typical flipper length for penguins with bills of 51 mm whereas the y_new provides the predictive model for the flipper length of Pablo.

  1. Construct, discuss, and compare density plots for the posterior models above:
#plot the posterior model of the typical flipper length for penguins with bills of 51mm
ggplot(predict_51,aes(x=mu)) +
  geom_density()+
  xlim(150,300)+
  labs(title = "Typical Flipper Length for Penguins with 51mm Bills")

#plot posterior predictive model of Pablo's flipper length
ggplot(predict_51,aes(x=y_new))+
  geom_density()+
  xlim(150,300) +
  labs(title="Pablo's Predicted Flipper Length")

This again shows that the distribution for mu is much more narrow for the typical length of penguin flippers than the y_new distribution for the prediction of Pablo’s flipper length.

  1. Calculate and interpret an 80% posterior prediction interval for Pablo’s flipper length
predict_51 |> 
  summarize(lower_mu = quantile(mu,0.10),
            upper_mu = quantile(mu,0.90),
            lower_new = quantile(y_new,0.10),
            upper_new = quantile(y_new,0.90))
##   lower_mu upper_mu lower_new upper_new
## 1 211.6732 214.0699  199.0659  226.5565

The 80% CI for Pablo’s flipper length is (199.1, 226.6). This means that we have 80% certainty that Pablo’s flipper length falls between 199 and 226 mm.

  1. Would the 80% CI for the typical flipper length among all penguins with 51mm bills be wider and narrower?

The CI for typical flipper length would be narrower. Predicting the typical value we can have more confidence in where the average value lies. However, when predicting any single case, we have to take in to account the individual variance that any one subject might have thus widening the range of plausible values.

  1. Use posterior_predict() to confirm your results to parts b and c
#simulate a set of predictions
set.seed(369)
shortcut_prediction <- posterior_predict(penguin_model,newdata=data.frame(bill_length_mm = 51))

#construct a 80% posterior CI
posterior_interval(shortcut_prediction,prob=0.80)
##        10%      90%
## 1 199.0659 226.5565
#plot the approximate predictive model
mcmc_dens(shortcut_prediction) +
  xlab("Predicted Flipper Length of a 51mm Bill Penguin")

This posterior model looks very similar to the one I generated above and the 80% credible interval is the same.

Question 9.20: More Penguins

Instead of bill length, consider the normal regression model of penguin flipper length (Y) by body mass in grams (X)

  1. Based on their study, support researchers are certain about the relationship between flipper length and body mass. \(\beta_1 \text{ ~ } N(0.01,0.002^2)\). Describe their prior understanding

The researchers define a prior distribution for \(\beta_1\) with a mean of 0.01 and sd = 0.002. This means that on average, a one unit increase in body mass in grams results in a 0.01 increase in flipper length in mm. This conveys a positive relationship between the two. The sd is pretty small which conveys their high certainty.

  1. Plot and discuss the observed relationship between flipper length and body mass
ggplot(penguins_bayes,aes(x=body_mass_g,y=flipper_length_mm))+
  geom_point()+
  geom_smooth(method="lm",se=FALSE)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).

This does show a positive relationship between body mass and flipper length in the data. This also shows that the points are pretty tight to the line, meaning that the SD should be small.

  1. In a simple Normal regression model of flipper length Y by one predictor X, do you think that the \(\sigma\) parameter is bigger when X = bill length or when X = body mass?

I would expect the sigma to be bigger for X = bill length. This is because I see a clear relationship between body mass and length of flipper, which is part of the physical body, whereas I can see much more variation occurring in the length of the bill, where large birds have small bills and small birds have large bills.

According to our previous investigations, we saw that the spread of data in the data plot for flipper and bill length was wider than this data plot of body mass and flipper length.

  1. Use stan_glm() to simulate the Normal regression posterior model of flipper length by body mass using the researchers’ prior for beta_1 and weakly informative priors of beta_0c and sigma. Use 4 chains and 10000 iterations each
penguin_2 <- stan_glm(flipper_length_mm ~ body_mass_g, data = penguins_bayes,
                      family = gaussian,
                      prior_intercept = normal(0,2.5,autoscale=TRUE),
                      prior = normal(0.01,0.002),
                      prior_aux = exponential(1,autoscale=TRUE),
                      chains = 4, iter = 5000*2,seed=369)
  1. Plot the posterior model of beta_1 coefficient. Use this to describe the researchers’ posterior understanding of the relationship between flippers and mass and how its evolved from the prior understanding
#100 prior model lines

penguins_bayes |> 
  drop_na() |> 
  add_fitted_draws(penguin_2,n=100) |> 
  ggplot(aes(x=body_mass_g,y=flipper_length_mm)) +
  geom_line(aes(y=.value,group=.draw),alpha=0.15)
## Warning: `fitted_draws` and `add_fitted_draws` are deprecated as their names were confusing.
## Use [add_]epred_draws() to get the expectation of the posterior predictive.
## Use [add_]linpred_draws() to get the distribution of the linear predictor.
## For example, you used [add_]fitted_draws(..., scale = "response"), which
## means you most likely want [add_]epred_draws(...).

#posterior summary
tidy(penguin_2,effects=c("fixed","aux"),
     conf.int=TRUE,conf.level=0.95)
## # A tibble: 4 × 5
##   term        estimate std.error conf.low conf.high
##   <chr>          <dbl>     <dbl>    <dbl>     <dbl>
## 1 (Intercept) 138.      1.93     134.      142.    
## 2 body_mass_g   0.0150  0.000455   0.0141    0.0159
## 3 sigma         6.93    0.267      6.44      7.48  
## 4 mean_PPD    201.      0.526    200.      202.

This posterior model shows that the relationship between body mass and flipper length is strongly positive. All the plausible posterior lines are strongly positive and there is very little variability. The 95% CI for the beta_1 is (0.014, 0.016) meaning for every 1 unit increase in body mass (1 gram) there is between 0.014 and 0.016 increase in flipper length mm and there is very little variability in there. This model looks very similar to the prior model where the mean was 0.01 and sd = 0.002. Here, the intercept value is 0.015 and the SE is 0.0005.